AMENDMENTS TO THE CLAIMS 

1. (Currently Amended) A language model generation and accumulation apparatus 
that generates and accumulates language models for speech recognition, the apparatus 
comprising: 

a lower-level N-gram language model generation and accumulation unit operable to generate 
and accumulate a lower-level N-gram language model that is obtained by modeling a first sequence 
of words having a specific linguistic property; and 

a higher-level N-gram language model generation and accumulation unit operable to generate 
and accumulate a higher-level N-gram language model that is obtained by modeling the first 
sequence of words modeled in the lower-level N-gram language model as a word string class and a 
plurality of texts as a second sequence of words that includes the word string class.
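
As a non-limiting illustration of the two units recited in Claim 1, the following minimal sketch assumes N = 2 (bigrams), maximum-likelihood estimates, and a hypothetical virtual word `<title>` standing in for the word string class. The function names, the sample titles, and the sample texts are assumptions made for illustration and do not come from the claims.

```python
from collections import Counter

CLASS_TOKEN = "<title>"  # hypothetical virtual word for the word string class


def bigram_probs(sequences):
    """Maximum-likelihood bigram probabilities P(w2 | w1) with <s>/</s> padding."""
    pair_counts, context_counts = Counter(), Counter()
    for seq in sequences:
        padded = ["<s>", *seq, "</s>"]
        for w1, w2 in zip(padded, padded[1:]):
            pair_counts[(w1, w2)] += 1
            context_counts[w1] += 1
    return {pair: n / context_counts[pair[0]] for pair, n in pair_counts.items()}


def substitute_class(text, class_strings):
    """Replace each occurrence of a class word string in `text` with CLASS_TOKEN."""
    out, i = [], 0
    while i < len(text):
        for s in class_strings:
            if text[i:i + len(s)] == s:
                out.append(CLASS_TOKEN)
                i += len(s)
                break
        else:  # no class word string starts at position i
            out.append(text[i])
            i += 1
    return out


# First sequences of words sharing a linguistic property (assumed: programme titles).
titles = [["shining", "sun"], ["shining", "star"]]
# Texts that contain those word strings.
texts = [["record", "shining", "sun"], ["play", "shining", "star"]]

# Lower-level model: word sequences inside the class.
lower = bigram_probs(titles)
# Higher-level model: texts with the class collapsed to one token.
higher = bigram_probs([substitute_class(t, titles) for t in texts])
```

Under these assumptions, `lower` holds probabilities for word pairs inside a title, while `higher` only ever sees the single token `<title>` in place of the title words.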

2. (Original) The language model generation and accumulation apparatus according to 
Claim 1, 

wherein the higher-level N-gram language model generation and accumulation unit and 
the lower-level N-gram language model generation and accumulation unit generate the respective 
language models, using different corpuses. 

3. (Original) The language model generation and accumulation apparatus according to 
Claim 2, 

wherein the lower-level N-gram language model generation and accumulation unit 
includes 

a corpus update unit operable to update the corpus for the lower-level N-gram language 
model, and 

the lower-level N-gram language model generation and accumulation unit updates the 
lower-level N-gram language model based on the updated corpus, and generates the updated 
lower-level N-gram language model. 

4. (Currently Amended) The language model generation and accumulation apparatus 
according to Claim 1, 

wherein the lower-level N-gram language model generation and accumulation unit 
analyzes the first sequence of words within the word string class into one or more morphemes 
that are the smallest language units having meanings, and generates the lower-level N-gram 
language model by modeling each sequence of the one or more morphemes based on the word 
string class. 

5. (Currently Amended) The language model generation and accumulation apparatus 
according to Claim 1, 

wherein the higher-level N-gram language model generation and accumulation unit 
substitutes the word string class with a virtual word, and then generates the higher-level N-gram 
language model by modeling a sequence made up of the virtual word and other words, 
the word string class being included in each of the plurality of texts analyzed into 
morphemes. 

6. (Currently Amended) The language model generation and accumulation apparatus 
according to Claim 1, 

wherein the lower-level N-gram language model generation and accumulation unit 
includes 

an exception word judgment unit operable to judge whether or not a specific word out of 
a plurality of words that appear in the word string class should be treated as an exception 
word, based on a linguistic property of the specific word, and divides the exception word 
into (i) a syllable that is a basic phonetic unit constituting a pronunciation of the exception 
word and (ii) a unit that is obtained by combining syllables based on a judgment result, 
the exception word being a word not being included as a constituent word of the 
word string class, and 

the language model generation and accumulation apparatus further comprises 
a class dependent syllable N-gram generation and accumulation unit operable to generate 
class dependent syllable N-grams by modeling a sequence made up of the syllable and the unit 
obtained by combining syllables and by providing a language likelihood to the sequence in 
dependency on either the word string class or the linguistic property of the exception word, and 
accumulate the generated class dependent syllable N-grams, the language likelihood 
being a logarithm value of a probability. 
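
As a non-limiting illustration of the class dependent syllable N-grams recited in Claim 6, the following minimal sketch assumes bigrams over single syllables, maximum-likelihood estimates, and exception words that have already been divided into syllables. The syllable data, the class name `<title>`, and the function names are hypothetical. Units obtained by combining syllables could be handled the same way by treating each combined unit as one list element.

```python
import math
from collections import Counter


def syllable_bigram_loglik(syllable_seqs):
    """Return log P(s2 | s1) for syllable bigrams observed within one class."""
    pairs, contexts = Counter(), Counter()
    for seq in syllable_seqs:
        padded = ["<s>", *seq, "</s>"]
        for s1, s2 in zip(padded, padded[1:]):
            pairs[(s1, s2)] += 1
            contexts[s1] += 1
    return {p: math.log(n / contexts[p[0]]) for p, n in pairs.items()}


# Hypothetical exception words (e.g. personal names) divided into syllables.
names_in_titles = [["ta", "na", "ka"], ["ta", "ke", "da"]]

# One table per word string class, so the likelihood depends on the class.
class_dependent = {"<title>": syllable_bigram_loglik(names_in_titles)}


def score(syllables, cls):
    """Sum the log likelihoods along the sequence; unseen bigrams score -inf."""
    table = class_dependent[cls]
    padded = ["<s>", *syllables, "</s>"]
    return sum(table.get(p, -math.inf) for p in zip(padded, padded[1:]))
```

Because the language likelihood is a logarithm of a probability, likelihoods along a sequence are added rather than multiplied.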

7. (Currently Amended) The language model generation and accumulation apparatus 
according to Claim 1, further comprising 

a syntactic tree generation unit operable to perform morphemic analysis as well as 
syntactic analysis of a text, and generate a syntactic tree in which the text is structured by a 
plurality of layers, focusing on a node that is on the syntactic tree and that has been selected 
on the basis of a predetermined criterion, 

wherein the higher-level N-gram language model generation and accumulation unit 
generates the higher-level N-gram language model for syntactic tree, using a first subtree that 
constitutes an upper layer from the focused node, and 

the lower-level N-gram language model generation and accumulation unit generates the 
lower-level N-gram language model for syntactic tree, using a second subtree that constitutes a 
lower layer from the focused node. 
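
As a non-limiting illustration of the split recited in Claim 7, the following minimal sketch assumes a syntactic tree encoded as nested `(label, children)` tuples with words as string leaves, and assumes the focused node is selected by its label. The encoding, the labels, and the function names are hypothetical.

```python
def leaves(node):
    """Flatten a tree of (label, children) tuples and str leaves into a word list."""
    if isinstance(node, str):
        return [node]
    _, children = node
    return [w for child in children for w in leaves(child)]


def split_at(node, focus_label):
    """Return (upper_words, lower_word_lists) split at nodes labelled focus_label.

    In the upper sequence each focused subtree is collapsed to a single token;
    the words below each focused node are returned separately so that they can
    be used for the lower-level model.
    """
    if isinstance(node, str):
        return [node], []
    label, children = node
    if label == focus_label:
        return [f"<{label}>"], [leaves(node)]
    upper, lower = [], []
    for child in children:
        u, l = split_at(child, focus_label)
        upper += u
        lower += l
    return upper, lower


# Hypothetical parse of "record shining sun tonight"; labels are illustrative.
tree = ("S", [("VP", ["record", ("TITLE", ["shining", "sun"])]), "tonight"])
upper, lower = split_at(tree, "TITLE")
```

Here `upper` corresponds to the first subtree (the layer above the focused node) and each list in `lower` corresponds to a second subtree (the layer below it).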

8. (Currently Amended) The language model generation and accumulation apparatus 
according to Claim 7, 

wherein the lower-level N-gram language model generation and accumulation unit 
includes 

a language model generation exception word judgment unit operable to judge a specific 
word appearing in the second subtree as an exception word based on a predetermined linguistic 
property, the exception word being a word not being included as a constituent word of any 
subtree, and 

the lower-level N-gram language model generation and accumulation unit generates the 
lower-level N-gram language model by dividing the exception word into (i) a syllable that is a 
basic phonetic unit constituting a pronunciation of the word and (ii) a unit that is obtained by 
combining syllables, and then by modeling a sequence made up of the syllable and the unit 
obtained by combining syllables in dependency on a location of the exception word in the 
syntactic tree and on the linguistic property of the exception word. 

9. (Currently Amended) The language model generation and accumulation apparatus 
according to Claim 1, further comprising 

a syntactic tree generation unit operable to perform morphemic analysis as well as 
syntactic analysis of a text, and generate a syntactic tree in which the text is structured by a 
plurality of layers, focusing on a node that is on the syntactic tree and that has been selected 
based on a predetermined criterion, 

wherein the higher-level N-gram language model generation and accumulation unit 
generates the higher-level N-gram language model, using a first subtree that constitutes a highest 
layer of the syntactic tree, and 

the lower-level N-gram language model generation and accumulation unit categorizes 
each subtree constituting a layer lower than a second layer based on a positioning of said each 
subtree when included in the upper layer, and generates the lower-level N-gram language model 
by use of each of the categorized subtrees. 

10. (Currently Amended) The language model generation and accumulation apparatus 
according to Claim 9, 

wherein the lower-level N-gram language model generation and accumulation unit 
includes 

a language model generation exception word judgment unit operable to judge, as an 
exception word, a specific word appearing in any subtree in a layer lower than the 
second layer based on a predetermined linguistic property, the exception word being a word 
not being included as a constituent word of any subtree, and 

the lower-level N-gram language model generation and accumulation unit divides the 
exception word into (i) a syllable that is a basic phonetic unit constituting a pronunciation of 
the word and (ii) a unit that is obtained by combining syllables, and generates the lower-level 
N-gram language model by modeling a sequence made up of the syllable and the unit obtained by 
combining syllables in dependency on a position of the exception word in the syntactic tree and 
on the linguistic property of the exception word. 

11. (Currently Amended) The language model generation and accumulation apparatus 
according to Claim 1, 

wherein the higher-level N-gram language model generation and accumulation unit 
generates the higher-level N-gram language model in which each sequence of N words including 
the word string class is associated with a probability at which said each sequence of N words 
occurs. 

12. (Currently Amended) The language model generation and accumulation apparatus 
according to Claim 1, 

wherein the lower-level N-gram language model generation and accumulation unit 
generates the lower-level N-gram language model by associating each of an N-long chain of 
words constituting the word string class with a probability at which said each of the N-long chain 
of words occurs. 

13. (Currently Amended) A speech recognition apparatus that recognizes a speech 
which is a sequence of uttered words, using the following: 

a lower-level N-gram language model that is obtained by modeling a first sequence of words 
having a specific linguistic property; and 

a higher-level N-gram language model that is obtained by modeling the first sequence of 
words modeled in the lower-level N-gram language model as a word string class and a plurality of 
texts as a second sequence of words that includes the word string class.

14. (Currently Amended) A speech recognition apparatus that recognizes a sequence 
of uttered words, comprising: 

a lower-level N-gram language model generation and accumulation unit operable to generate 
and accumulate a lower-level N-gram language model that is obtained by modeling a first sequence 
of words having a specific linguistic property; and 

a higher-level N-gram language model generation and accumulation unit operable to generate 
and accumulate a higher-level N-gram language model that is obtained by modeling the first 
sequence of words modeled in the lower-level N-gram language model as a word string class and a 
plurality of texts as a second sequence of words that includes the word string class, and 

the speech recognition apparatus recognizes the speech by use of the higher-level N-gram 
language model that is accumulated by the higher-level N-gram language model generation and 
accumulation unit and the lower-level N-gram language model that is accumulated by the lower- 
level N-gram language model generation and accumulation unit. 

15. (Currently Amended) The speech recognition apparatus according to Claim 14, 
wherein the higher-level N-gram language model generation and accumulation unit and 

the lower-level N-gram language model generation and accumulation unit generate the respective 
language models, using different corpuses, and 

the speech recognition apparatus recognizes the speech by use of the higher-level N-gram 
language model and the lower-level N-gram language model that have been respectively 
constructed using the different corpuses. 

16. (Currently Amended) The speech recognition apparatus according to Claim 15, 
wherein the lower-level N-gram language model generation and accumulation unit 

includes 

a corpus update unit operable to update a corpus for the lower-level N-gram 
language model, 

the lower-level N-gram language model generation and accumulation unit updates the 
lower-level N-gram language model based on the updated corpus, and generates the updated 
lower-level N-gram language model, and 

the speech recognition apparatus recognizes the speech by use of the updated lower-level 
N-gram language model. 

17. (Currently Amended) The speech recognition apparatus according to Claim 14, 
wherein the lower-level N-gram language model generation and accumulation unit 

analyzes a sequence of words within the word string class into one or more morphemes that 
are the smallest language units having meanings, and generates the lower-level N-gram language 
model by modeling each sequence of the one or more morphemes based on 
the word string class, and 

the speech recognition apparatus recognizes the speech by use of the lower-level N-gram 
language model that has been modeled as the sequence of the one or more morphemes. 

18. (Currently Amended) The speech recognition apparatus according to Claim 14, 
wherein the higher-level N-gram language model generation and accumulation unit 

substitutes the word string class with a virtual word, and then generates the higher-level N-gram 
language model by modeling a sequence made up of the virtual word and other words, 
the word string class being included in each of the plurality of texts analyzed into 
morphemes, and 

the speech recognition apparatus recognizes the speech by use of the higher-level N-gram 
language model that has been modeled as the sequence made up of the virtual word and the other 
words. 

19. (Currently Amended) The speech recognition apparatus according to Claim 18, 
wherein the lower-level N-gram language model generation and accumulation unit 

includes 

an exception word judgment unit operable to judge whether or not a specific word out of 
a plurality of words that appear in the word string class should be treated as an exception 
word, based on a linguistic property of the specific word, and divides the exception word into 
(i) a syllable that is a basic phonetic unit constituting a pronunciation of the exception word 
and (ii) a unit that is obtained by combining syllables based on a result of the judgment, 
the exception word being a word not being included as a constituent word of the word string 
class, 

the language model generation and accumulation apparatus further comprises 
a class dependent syllable N-gram generation and accumulation unit operable to generate 
class dependent syllable N-grams by modeling a sequence made up of the syllable and the unit 
obtained by combining syllables and by providing a language likelihood to the sequence in 
dependency on either the word string class or the linguistic property of the exception word, and 
accumulate the generated class dependent syllable N-grams, the language likelihood 
being a logarithm value of a probability, and 

the speech recognition apparatus recognizes the speech by use of the class dependent 
syllable N-grams. 

20. (Currently Amended) The speech recognition apparatus according to Claim 19, 
wherein the language model generation and accumulation apparatus further comprises 

a syntactic tree generation unit operable to perform morphemic analysis as well as 
syntactic analysis of a text, and generate a syntactic tree in which the text is structured by a 
plurality of layers, focusing on a node that is on the syntactic tree and that has been selected 
on the basis of a predetermined criterion, 

wherein the higher-level N-gram language model generation and accumulation unit 
generates the higher-level N-gram language model for syntactic tree, using a first subtree that 
constitutes an upper layer from the focused node, and 

the lower-level N-gram language model generation and accumulation unit generates the 
lower-level N-gram language model for syntactic tree, using a second subtree that constitutes a 
lower layer from the focused node, and 

the speech recognition apparatus comprises: 

an acoustic processing unit operable to generate feature parameters from the speech; 
a word comparison unit operable to compare a pronunciation of each word with each of 
the feature parameters, and generate a set of word hypotheses including an utterance segment of 
said each word and an acoustic likelihood of said each word; and 

a word string hypothesis generation unit operable to generate a word string hypothesis 
from the set of word hypotheses with reference to the higher-level N-gram language model for 
syntactic tree and the lower-level N-gram language model for syntactic tree, and generate a result 
of the speech recognition. 

21. (Currently Amended) The speech recognition apparatus according to Claim 20, 
wherein the lower-level N-gram language model generation and accumulation unit 

includes 

a language model generation exception word judgment unit operable to judge a specific 
word appearing in the second subtree as an exception word based on a predetermined linguistic 
property, the exception word being a word not being included as a constituent word of any 
subtree, 

the lower-level N-gram language model generation and accumulation unit generates the 
lower-level N-gram language model by dividing the exception word into (i) a syllable that is a 
basic phonetic unit constituting a pronunciation of the word and (ii) a unit that is obtained by 
combining syllables, and then by modeling a sequence made up of the syllable and the unit 
obtained by combining syllables in dependency on a location of the exception word in the 
syntactic tree and on the linguistic property of the exception word, and 

the word string hypothesis generation unit generates the result of the speech recognition. 

22. (Currently Amended) The speech recognition apparatus according to Claim 14, 

wherein the language model generation and accumulation apparatus further comprises 
a syntactic tree generation unit operable to perform morphemic analysis as well as 
syntactic analysis of a text, and generate a syntactic tree in which the text is structured by a 
plurality of layers, focusing on a node that is on the syntactic tree and that has been selected 
on the basis of a predetermined criterion, 

wherein the higher-level N-gram language model generation and accumulation unit 
generates the higher-level N-gram language model, using a first subtree that constitutes a highest 
layer of the syntactic tree, 

the lower-level N-gram language model generation and accumulation unit categorizes 
each subtree constituting a layer lower than a second layer based on a positioning of each 
subtree when included in the upper layer and generates the lower-level N-gram language model 
by use of each of the categorized subtrees, and 

the speech recognition apparatus recognizes the speech by use of the higher-level N-gram 
language model that has been generated using the first subtree and the lower-level N-gram 
language model that has been generated using said each subtree constituting a layer lower than 
the second layer. 

23. (Currently Amended) The speech recognition apparatus according to Claim 22, 

wherein the lower-level N-gram language model generation and accumulation unit 
includes 

a language model generation exception word judgment unit operable to judge, as an 
exception word, a specific word appearing in any subtree in a layer lower than the 
second layer based on a predetermined linguistic property, the exception word being a word 
not being included as a constituent word of any subtree, 

the lower-level N-gram language model generation and accumulation unit divides the 
exception word into (i) a syllable that is a basic phonetic unit constituting a pronunciation of 
the word and (ii) a unit that is obtained by combining syllables, and generates the lower-level 
N-gram language model by modeling a sequence made up of the syllable and the unit obtained by 
combining syllables in dependency on a position of the exception word in the syntactic tree and 
on the linguistic property of the exception word, and 

the speech recognition apparatus recognizes the speech by use of the higher-level N-gram 
language model that does not include the exception word and the lower-level N-gram language 
model that includes the exception word. 

24. (Currently Amended) The speech recognition apparatus according to Claim 14, 

wherein the higher-level N-gram language model generation and accumulation unit 
generates the higher-level N-gram language model in which each sequence of N words including 
the word string class is associated with a probability at which each sequence of words 
occurs, and 

the speech recognition apparatus comprises 

a word string hypothesis generation unit operable to evaluate a word string hypothesis 
by multiplying each probability at which each sequence of N words including the word 
string class occurs. 

25. (Currently Amended) The speech recognition apparatus according to Claim 14, 

wherein the lower-level N-gram language model generation and accumulation unit 
generates the lower-level N-gram language model by associating each N-long chain of words 
constituting the word string class with a probability at which each chain of words occurs, 
and 

the speech recognition apparatus comprises 

a word string hypothesis generation unit operable to evaluate a word string hypothesis 
by multiplying each probability at which each sequence of N words inside the word string 
class occurs. 
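
As a non-limiting illustration of the evaluation recited in Claims 24 and 25, the following minimal sketch assumes N = 2 and multiplies the probabilities along the outer word sequence (higher-level model) and along the words inside the word string class (lower-level model). The probability tables, the token `<title>`, and the function names are hypothetical values chosen for illustration.

```python
import math

# Hypothetical probabilities; in practice these come from the accumulated models.
higher = {("<s>", "record"): 0.5, ("record", "<title>"): 1.0, ("<title>", "</s>"): 1.0}
lower = {("<s>", "shining"): 1.0, ("shining", "sun"): 0.5, ("sun", "</s>"): 1.0}


def chain_prob(words, table):
    """Multiply bigram probabilities along a padded sequence; unseen bigrams give 0."""
    padded = ["<s>", *words, "</s>"]
    return math.prod(table.get(pair, 0.0) for pair in zip(padded, padded[1:]))


def hypothesis_prob(outer_words, class_words):
    """Score a hypothesis as P_higher(outer sequence) * P_lower(words in the class)."""
    return chain_prob(outer_words, higher) * chain_prob(class_words, lower)


# Hypothesis "record shining sun", with "shining sun" treated as the class.
p = hypothesis_prob(["record", "<title>"], ["shining", "sun"])
```

A practical recognizer would typically add log probabilities instead of multiplying raw probabilities to avoid numerical underflow on long hypotheses; the product form is kept here to mirror the claim wording.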

26. (Currently Amended) A language model generation method for generating 
language models for speech recognition, comprising: 

a lower-level N-gram language model generation and accumulation step for generating and 
accumulating a lower-level N-gram language model that is obtained by modeling a first sequence of 
words having a specific linguistic property; and 

a higher-level N-gram language model generation and accumulation step for generating and 
accumulating a higher-level N-gram language model that is obtained by modeling the first sequence 
of words modeled in the lower-level N-gram language model as a word string class and a plurality of 
texts as a second sequence of words that includes the word string class.

27. (Currently Amended) A speech recognition method for recognizing a speech 
which is a sequence of uttered words, using the following: 

a lower-level N-gram language model that is obtained by modeling a first sequence of words 
having a specific linguistic property; and 

a higher-level N-gram language model that is obtained by modeling the first sequence of 
words modeled in the lower-level N-gram language model as a word string class and a plurality of 
texts as a second sequence of words that includes the word string class.

28. (Currently Amended) The speech recognition method according to Claim 27, 
further comprising 

a step of categorizing each word string having a specific linguistic property as a word 
string class, and providing, to each word string, a language likelihood which is a 
logarithm value of a probability, by use of class dependent word N-grams that are obtained by 
modeling the word string class in dependency on the word string class based on a 
linguistic relationship between words constituting the word string class; 

a step of analyzing a text into a word and the word string class, and providing, to a 
sequence of the word and the word string class, a language likelihood which is a logarithm 
value of a probability, by use of class N-grams that are obtained by modeling the sequence of 
the word and the word string class based on a linguistic relationship; and 

a step of (i) comparing feature parameters extracted from a series of speeches with a 
pronunciation as well as an acoustic characteristic of each word and generating a set of word 
hypotheses including an utterance segment of each word and an acoustic likelihood of 
each word, (ii) generating a word string hypothesis from the set of word string 
hypotheses with reference to the class N-grams and the class dependent word N-grams, and (iii) 
outputting a result of the speech recognition. 

29. (Currently Amended) A program for performing a language model generation 
method that is intended for generating a language model for speech recognition, the program 
causing a computer to execute the following steps: 

a lower-level N-gram language model generation and accumulation step for generating and 
accumulating a lower-level N-gram language model that is obtained by modeling a first sequence of 
words having a specific linguistic property; and 

a higher-level N-gram language model generation and accumulation step for generating and 
accumulating a higher-level N-gram language model that is obtained by modeling the first sequence 
of words modeled in the lower-level N-gram language model as a word string class and a plurality of 
texts as a second sequence of words that includes the word string class.

30. (Currently Amended) A program for performing a speech recognition method that 
is intended for recognizing a sequence of uttered words, the program causing a computer to 
execute a speech recognition step that is performed by use of the following: 

a lower-level N-gram language model that is obtained by modeling a first sequence of words 
having a specific linguistic property; and 

a higher-level N-gram language model that is obtained by modeling the first sequence of 
words modeled in the lower-level N-gram language model as a word string class and a plurality of 
texts as a second sequence of words that includes the word string class.


